Automatic client side seamless failover

ABSTRACT

A standby database cluster takes on the role of the primary database cluster if the primary database cluster becomes unavailable, using the following steps: (i) operating a database management system (DBMS) including an initial primary cluster and a plurality of standby clusters; (ii) communicating to a set of client driver(s) connecting a first application to the initial primary cluster an identity of the plurality of standby clusters; (iii) on condition that the initial primary cluster becomes unavailable, assigning a selected standby cluster of the plurality of standby clusters as a new primary cluster in place of the initial primary cluster; and (iv) in response to assignment of the new primary cluster, seamlessly moving the first application from the initial primary cluster to the new primary cluster without any substantial human intervention.

The following disclosure(s) are submitted under 35 U.S.C. 102(b)(1)(A) as prior disclosures by, or on behalf of, a sole inventor of the present application or a joint inventor of the present application:

“DB2 Version 10.5 Fix Pack 2 for Linux, UNIX, and Windows”, IBM, Release Date: 10 Oct. 2013.

BACKGROUND OF THE INVENTION

The present invention relates generally to the field of database failover and more particularly to database failover in systems with database management system (DBMS) clusters.

In the field of computer science and computing, failover is defined as switching to a standby or redundant computer server, computer system, hardware component or computer network upon the failure or abnormal termination of a previously active application, server, system, network, or hardware component. Failover is typically applied automatically and usually operates without warning. Designers of computer systems typically provide failover capability in servers, systems or networks that require continuous availability. Likewise, failback is the process of restoring a system, network, service, or component, which is in a state of failover, back to its original state before the failure occurred.

At a server level, failover automation typically uses a physical connection between two (2) servers. As long as a connection remains between the main server and the second server, the second server will not initiate, or turn on, its systems. There may also be a third server that has running spare components for “hot switching” to prevent downtime. The second server takes over the work of the first server as soon as it detects an alteration in the connection of the first server. In addition, some systems have the ability to send a notification of failover.

Clustering is one of the common technologies adopted by DBMS (database management system) companies to obtain continuous database availability. Each cluster (herein also known as a group) consists of multiple database servers (also known as members). An advanced clustering configuration involves the existence of multiple clusters, where one cluster is active (called the primary), and the members within that cluster are responsible for servicing all applications with active transactions distributed among the members, according to different workload balancing algorithms. The remaining clusters are on standby and will take over the role of the primary cluster, only in the event the primary cluster goes down.

SUMMARY

According to an aspect of the present invention, there is a method, computer program product and/or system that performs the following steps (not necessarily in the following order): (i) operating a database management system (DBMS) including an initial primary cluster and a plurality of standby clusters; (ii) communicating to a set of client driver(s) connecting a first application to the initial primary cluster an identity of the plurality of standby clusters; (iii) on condition that the initial primary cluster becomes unavailable, assigning a selected standby cluster of the plurality of standby clusters as a new primary cluster in place of the initial primary cluster; and (iv) in response to assignment of the new primary cluster, seamlessly moving the first application from the initial primary cluster to the new primary cluster. In some embodiments, the seamless movement is performed without any substantial human intervention (that is, with little, or no, human intervention).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic view of a first embodiment of a system according to the present invention;

FIG. 2 is a flowchart showing a method performed, at least in part, by the first embodiment system;

FIG. 3 is a schematic view of a machine logic (for example, software) portion of the first embodiment system;

FIG. 4 is a screenshot view generated by the first embodiment system; and

FIG. 5 is a schematic view of a second embodiment of a system according to the present invention.

DETAILED DESCRIPTION

In some embodiments of the present invention a standby database cluster takes on the role of the primary database cluster if the primary database cluster becomes unavailable. In some embodiments, applications move to the standby cluster without any substantial human intervention and without outage. In some embodiments, client drivers are configured in such a manner that when an application running on the primary cluster goes down, the client driver immediately cycles through the alternate clusters until another cluster is found to take on the role of the new primary cluster. This Detailed Description section is divided into the following sub-sections: (i) The Hardware and Software Environment; (ii) Example Embodiment; (iii) Further Comments and/or Embodiments; and (iv) Definitions.

I. The Hardware and Software Environment

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

An embodiment of a possible hardware and software environment for software and/or methods according to the present invention will now be described in detail with reference to the Figures. FIG. 1 is a functional block diagram illustrating various portions of networked computers system (also sometimes referred to as “database system”) 100, including: database management sub-system 102; data storage cluster sub-systems (or, more simply, clusters) 104, 106, 108, 110; client sub-system 112; communication network 114; database management computer 200; communication unit 202; processor set 204; input/output (I/O) interface set 206; memory device 208; persistent storage device 210; display device 212; external device set 214; random access memory (RAM) devices 230; cache memory device 232; and program 300.

Sub-system 102 is, in many respects, representative of the various computer sub-system(s) in the present invention. Accordingly, several portions of sub-system 102 will now be discussed in the following paragraphs.

Sub-system 102 may be a laptop computer, tablet computer, netbook computer, personal computer (PC), a desktop computer, a personal digital assistant (PDA), a smart phone, or any programmable electronic device capable of communicating with the client sub-systems via network 114. Program 300 is a collection of machine readable instructions and/or data that is used to create, manage and control certain software functions that will be discussed in detail, below, in the Example Embodiment sub-section of this Detailed Description section.

Sub-system 102 is capable of communicating with other computer sub-systems via network 114. Network 114 can be, for example, a local area network (LAN), a wide area network (WAN) such as the Internet, or a combination of the two, and can include wired, wireless, or fiber optic connections. In general, network 114 can be any combination of connections and protocols that will support communications between server and client sub-systems.

Sub-system 102 is shown as a block diagram with many double arrows. These double arrows (no separate reference numerals) represent a communications fabric, which provides communications between various components of sub-system 102. This communications fabric can be implemented with any architecture designed for passing data and/or control information between processors (such as microprocessors, communications and network processors, etc.), system memory, peripheral devices, and any other hardware components within a system. For example, the communications fabric can be implemented, at least in part, with one or more buses.

Memory 208 and persistent storage 210 are computer-readable storage media. In general, memory 208 can include any suitable volatile or non-volatile computer-readable storage media. It is further noted that, now and/or in the near future: (i) external device(s) 214 may be able to supply some, or all, memory for sub-system 102; and/or (ii) devices external to sub-system 102 may be able to provide memory for sub-system 102.

Program 300 is stored in persistent storage 210 for access and/or execution by one or more of the respective computer processors 204, usually through one or more memories of memory 208. Persistent storage 210: (i) is at least more persistent than a signal in transit; (ii) stores the program (including its soft logic and/or data), on a tangible medium (such as magnetic or optical domains); and (iii) is substantially less persistent than permanent storage. Alternatively, data storage may be more persistent and/or permanent than the type of storage provided by persistent storage 210.

Program 300 may include both machine readable and performable instructions and/or substantive data (that is, the type of data stored in a database). In this particular embodiment, persistent storage 210 includes a magnetic hard disk drive. To name some possible variations, persistent storage 210 may include a solid state hard drive, a semiconductor storage device, read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, or any other computer-readable storage media that is capable of storing program instructions or digital information.

The media used by persistent storage 210 may also be removable. For example, a removable hard drive may be used for persistent storage 210. Other examples include optical and magnetic disks, thumb drives, and smart cards that are inserted into a drive for transfer onto another computer-readable storage medium that is also part of persistent storage 210.

Communications unit 202, in these examples, provides for communications with other data processing systems or devices external to sub-system 102. In these examples, communications unit 202 includes one or more network interface cards. Communications unit 202 may provide communications through the use of either or both physical and wireless communications links. Any software modules discussed herein may be downloaded to a persistent storage device (such as persistent storage device 210) through a communications unit (such as communications unit 202).

I/O interface set 206 allows for input and output of data with other devices that may be connected locally in data communication with server computer 200. For example, I/O interface set 206 provides a connection to external device set 214. External device set 214 will typically include devices such as a keyboard, keypad, a touch screen, and/or some other suitable input device. External device set 214 can also include portable computer-readable storage media such as, for example, thumb drives, portable optical or magnetic disks, and memory cards. Software and data used to practice embodiments of the present invention, for example, program 300, can be stored on such portable computer-readable storage media. In these embodiments the relevant software may (or may not) be loaded, in whole or in part, onto persistent storage device 210 via I/O interface set 206. I/O interface set 206 also connects in data communication with display device 212.

Display device 212 provides a mechanism to display data to a user and may be, for example, a computer monitor or a smart phone display screen.

The programs described herein are identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature herein is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

II. Example Embodiment

FIG. 2 shows flowchart 250 depicting a method according to the present invention. FIG. 3 shows program 300 for performing at least some of the method steps of flowchart 250. This method and associated software will now be discussed, over the course of the following paragraphs, with extensive reference to FIG. 2 (for the method step blocks) and FIG. 3 (for the software blocks).

Processing begins at step S255, where initial assignment module (“mod”) 302 assigns the clusters 104, 106, 108, 110 (see FIG. 1) to their respective roles. In this example, cluster 104 is assigned the initial primary cluster role, and clusters 106, 108 and 110 (that is, the original standby clusters) are assigned to standby roles.

Processing proceeds to step S260, where normal operations mod 304 operates the database system so that initial primary cluster 104 is used as a database by applications running at client 112.

Processing proceeds to step S265, where detect unavailability mod 306 determines that the primary cluster has become unavailable. The manner in which primary cluster unavailability is detected will be further discussed, below, in the Further Comments and/or Embodiments sub-section of this Detailed Description section.

Processing proceeds to step S270, where select new primary mod 308 selects the new primary cluster from the original set of standby clusters 106, 108, 110 (see FIG. 1). The manner in which the new primary cluster is selected will be further discussed, below, in the Further Comments and/or Embodiments sub-section of this Detailed Description section. In this example, cluster 110 is selected as the new primary cluster.

Processing proceeds to step S275, where seamless movement (see definition, below, in the Definitions sub-section of this Detailed Description section) mod 310 seamlessly moves the database accessing applications running on client 112 (see FIG. 1) from initial primary cluster 104 to new primary cluster 110. In this embodiment, a human database manager (not shown in the Figures) is notified of the switch through display device 212 (see FIG. 1), as shown at window 402 of screenshot 400 of FIG. 4.
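By way of illustration only, the flow of steps S255 through S275 might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of program 300: the `Cluster` class, the function names, and the `first_available` default policy are hypothetical, and the selection policy is left as a parameter because the manner of selecting the new primary cluster is not fixed by this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Cluster:
    """Hypothetical stand-in for a data storage cluster such as 104 or 110."""
    name: str
    role: str = "unassigned"
    available: bool = True


def assign_initial_roles(clusters: List[Cluster]) -> Cluster:
    """S255: the first cluster becomes primary, the rest become standbys."""
    for cluster in clusters:
        cluster.role = "standby"
    clusters[0].role = "primary"
    return clusters[0]


def first_available(standbys: List[Cluster]) -> Optional[Cluster]:
    """Assumed default policy: the first standby that is reachable."""
    return next((c for c in standbys if c.available), None)


def run_failover(
    clusters: List[Cluster],
    select: Callable[[List[Cluster]], Optional[Cluster]] = first_available,
) -> Cluster:
    """S265 to S275: detect an unavailable primary and promote a standby."""
    primary = next(c for c in clusters if c.role == "primary")
    if primary.available:            # S265: nothing to do while primary is up
        return primary
    standbys = [c for c in clusters if c.role == "standby"]
    new_primary = select(standbys)   # S270: choose among the standbys
    if new_primary is None:
        raise RuntimeError("no standby cluster is available")
    primary.role = "standby"
    new_primary.role = "primary"     # S275: applications now use this cluster
    return new_primary
```

With clusters named 104, 106, 108 and 110, marking 104 unavailable and passing `select=lambda standbys: standbys[-1]` reproduces the example above, in which cluster 110 becomes the new primary cluster.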

III. Further Comments and/or Embodiments

Some embodiments of the present invention recognize the following facts, potential problems and/or potential areas for improvement with respect to the current state of the art: (i) existing server side failover techniques using cluster (herein also known as a group) technology exhibit the limitation that only “new incoming” connections fail over to the database cluster that has assumed the primary database role; (ii) connections that are servicing “existing applications” will return a connection failure message; and/or (iii) a side effect of a takeover mechanism is that “already connected” applications suffer connection failure leading to downtime.

Some embodiments of the present invention may include one, or more, of the following features, characteristics and/or advantages: (i) when a database takeover occurs with multiple database clusters (that is, a standby cluster takes on the role of the primary cluster if the primary cluster becomes unavailable) applications move seamlessly to the standby database cluster that has taken on the primary role without any intervention; (ii) when a database takeover occurs with multiple database clusters (that is, a standby cluster takes on the role of the primary cluster if the primary cluster becomes unavailable) applications move seamlessly to the standby database cluster that has taken on the primary role without any outage; (iii) the client drivers connecting the applications to the DBMS (database management system) cluster are configured with knowledge of the alternate clusters; (iv) when an application in the primary cluster goes down (none of its members are available), the client driver cycles through the alternate clusters until the software finds a cluster that can now become the new primary cluster, without returning a connection failure to the application; (v) any “in progress” transactions are failed over to a member within the new primary cluster, as long as conditions for safe failover are satisfied (for example, first SQL (structured query language) of a transaction); (vi) the choice of the member within the new primary cluster is determined by the workload balancing algorithm being used; and/or (vii) utilizing the automatic client side failover technique for multiple clusters, new connections, as well as existing connections, experience little or no downtime or connection failure. More specifically with respect to item (v) on the foregoing list, in some embodiments, only SQL queries at a unit of work boundary (i.e., first SQL in a transaction) are eligible for a seamless failover.
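The unit of work boundary condition of item (v) might be sketched as follows. This is an illustrative sketch only: the `Transaction` class, the `ConnectionLost` exception, and the two callables passed to `execute` are hypothetical names that do not come from any actual client driver.

```python
from dataclasses import dataclass


class ConnectionLost(Exception):
    """Hypothetical error raised when the current member is unreachable."""


@dataclass
class Transaction:
    # Number of SQL statements already completed in this unit of work.
    statements_completed: int = 0


def can_fail_over_seamlessly(txn: Transaction) -> bool:
    """True only at a unit of work boundary (the first SQL of a transaction)."""
    return txn.statements_completed == 0


def execute(txn, sql, run_on_current, run_on_new_primary):
    """Run one statement, retrying on the new primary only when it is safe."""
    try:
        result = run_on_current(sql)
    except ConnectionLost:
        if not can_fail_over_seamlessly(txn):
            raise  # mid-transaction: the application must see the failure
        result = run_on_new_primary(sql)
    txn.statements_completed += 1
    return result
```

In this sketch, a statement that fails as the first SQL of a transaction is silently retried against the new primary cluster, while a failure later in the same transaction is surfaced to the application, since earlier work in that transaction may have been lost with the old connection.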

As shown in FIG. 5, system 500 includes client driver 501; replication communication paths 502; database group A 504 (including members (herein also known as “mem”) mem A1 506, mem A2 508, mem A3 510); database group B 512 (including mem B1 514, mem B2 516, mem B3 518); and database group C 520 (consisting of mem C1 522, mem C2 524, mem C3 526).

One embodiment of the present invention recognizes that within a database group or cluster, the database group members (such as mem A1 506, mem A2 508 and mem A3 510) are chosen according to existing workload balancing algorithms. The algorithm processing starts when the software client driver 501 connects the database clusters 504, 512, and 520 using replication communication paths 502 to the DBMS. Initially group A 504 is configured as the primary cluster. Group B 512 and group C 520 are specified as alternate clusters, in that order. Client driver 501 establishes the initial connection to mem A1 506 within group A 504. If group A 504 goes down (that is, none of the members in group A 504 are available), then group C 520 takes over the primary role. This process is described in the following paragraph.

Group A 504 switches to a standby role. Processing continues when an SQL operation is issued on the existing connection. The connection tries to acquire a socket transport to group A 504 mem A1 506, and receives an error stating that it is no longer the primary database. Client driver 501 gives up on group A 504 immediately and moves to group B 512. The software attempts to connect to group B 512 mem B1 514, and gets an error saying this group is not the primary database. Client driver 501 skips group B 512 and moves to group C 520. Client driver 501 then attempts connecting to mem C1 522 within group C 520. The software makes the connection, and the data is now routed to group C 520 mem C1 522.
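The walkthrough above might be sketched as follows. This is a simplified illustration under stated assumptions: the `Group` class, the `NotPrimaryError` exception, and the `find_primary` function are hypothetical, and the sketch always tries the first member of each group (as in the walkthrough), whereas in practice the member would be chosen by the workload balancing algorithm in use.

```python
from dataclasses import dataclass
from typing import List, Tuple


class NotPrimaryError(Exception):
    """Hypothetical error a member returns when its group is not the primary."""


@dataclass
class Group:
    name: str
    members: List[str]
    is_primary: bool = False

    def connect(self, member: str) -> str:
        # A member of a standby group rejects the connection attempt.
        if not self.is_primary:
            raise NotPrimaryError(f"{self.name} is not the primary database")
        return member


def find_primary(groups: List[Group]) -> Tuple[str, str]:
    """Try each configured group in order; return (group, member) on success."""
    for group in groups:
        try:
            member = group.connect(group.members[0])
        except NotPrimaryError:
            continue  # give up on this group immediately and try the next one
        return group.name, member
    raise ConnectionError("no configured group currently holds the primary role")
```

Because the "not primary" errors are absorbed inside the loop, the application only sees a failure if every configured group has been tried without success.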

Some embodiments of the present invention may further recognize a database client driver that can detect the switch of roles between clusters and seamlessly fail over the connected applications to the new primary cluster without an application outage. In the event a failed primary cluster comes back up and re-acquires the role as the primary database, client connections will seamlessly fail back to the new primary cluster.

Some embodiments of the present invention may further include one, or more, of the following features, characteristics and/or advantages, where the dynamic server switching system maintains a: (i) list in each client which identifies the primary server for that client; (ii) list in each client which identifies the preferred communication method; (iii) hierarchy of successively secondary servers; and/or (iv) hierarchy of communication method pairs.

Some embodiments of the present invention may further recognize that in the event the client does not have requests served by the designated primary server (or the designated communication method) the system traverses the list to ascertain the identity of the first available alternate server communication method pair, where the client uses this retrieved data to initiate future requests. The client periodically tests the primary server communication method pair to determine whether the fault has been cleared. If so, the client re-establishes the originally selected primary server communication method pair as the request route.
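A traversal of this kind, together with the periodic test of the primary pair, might be sketched as follows. The `RouteSelector` class and the `is_up` probe are hypothetical names introduced for illustration; how availability is actually probed is not specified here and is simply passed in as a callable.

```python
from typing import Callable, List, Tuple

# A route is a (server, communication method) pair; both values are
# illustrative labels only.
Route = Tuple[str, str]


class RouteSelector:
    def __init__(self, routes: List[Route], is_up: Callable[[Route], bool]):
        self.routes = routes      # routes[0] is the designated primary pair
        self.is_up = is_up        # assumed availability probe
        self.current = routes[0]

    def on_request_failed(self) -> Route:
        """Traverse the list and adopt the first available pair."""
        for route in self.routes:
            if self.is_up(route):
                self.current = route
                return route
        raise ConnectionError("no server/communication method pair is available")

    def periodic_check(self) -> Route:
        """Fail back once the primary pair answers again."""
        if self.current != self.routes[0] and self.is_up(self.routes[0]):
            self.current = self.routes[0]
        return self.current
```

In this sketch, `on_request_failed` would be called when a request is not served, and `periodic_check` would be called on a timer so that future requests return to the originally selected pair once the fault has cleared.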

Some embodiments of the present invention may further include one, or more, of the following features, characteristics and/or advantages: (i) failover and failback are client based (in a clustered database environment) where a group of database servers is organized as a cluster; (ii) provides a database configuration tailored to environments where many database servers are partitioned into clusters; (iii) provides one primary database cluster that contains preferred database servers; (iv) provides several alternate database clusters that act as secondary clusters; (v) uses improved client-server communication without manual intervention; (vi) the client moves connections to whichever cluster assumes the role of the primary cluster (in a dynamic cluster environment) where the server cluster can switch between primary and secondary roles depending upon availability; (vii) provides a method of detecting whether the primary cluster has come back up to initiate failback; and/or (viii) no polling is needed by the software to detect whether the original primary cluster has come back up.

Some embodiments of the present invention may further include one, or more, of the following features, characteristics and/or advantages: (i) the existence of a specialized client server protocol (for the currently connected primary cluster) to notify the client of the change in its role to secondary cluster when another cluster assumes the role as the primary cluster; (ii) routing of subsequent connections from the client to the new primary cluster; and/or (iii) the client does not need to poll the designated primary server to detect revival. Further with regard to item (i), a typical example of when the specialized client server protocol is used to notify the client of the change in role is when the original primary cluster comes “back up.”
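The notification-driven behavior of items (i) through (iii) might be sketched as follows. This is an assumption-laden illustration: the `Client` class and the shape of the role-change message (an old and a new primary name) are hypothetical, and the actual client server protocol is not described at this level of detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Client:
    """Hypothetical client that tracks which cluster currently holds the primary role."""
    primary: str
    history: List[str] = field(default_factory=list)

    def on_role_change(self, old_primary: str, new_primary: str) -> None:
        """Handle a role-change message pushed by the connected cluster."""
        if old_primary != self.primary:
            return  # stale message about a cluster this client is not using
        self.history.append(f"{old_primary} -> {new_primary}")
        self.primary = new_primary

    def route_new_connection(self) -> str:
        """Subsequent connections go to whichever cluster is primary now."""
        return self.primary
```

Because the currently connected cluster pushes the message, the client never has to poll the original primary cluster; a failback is handled by the same `on_role_change` path when the original primary cluster re-acquires the role.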

Some embodiments of the present invention may further include one, or more, of the following features, characteristics and/or advantages: (i) group IP (internet protocol) addresses are dedicated to each cluster; (ii) external monitor software is not needed to detect node failures; (iii) external monitor software is not needed to move the mobile IP around; (iv) IP addresses of alternate groups are configured at the client where the client itself can detect the failure of a connected primary database through information returned by the database server; (v) the client retries the configured IP addresses of alternate groups until the primary group is found; (vi) manages failovers between existing clusters of databases; (vii) manages how databases are added to clusters without affecting availability; (viii) applications are configured with the knowledge of high availability alternate clusters; (ix) using configuration, performs automatic failover of clients connected to a group of clusters (on the same platform) when the cluster loses the primary role; (x) using improved client-server communication, performs automatic failover of clients connected to a group of clusters (on the same platform) when the cluster loses the primary role; and/or (xi) performs communication between the client and the server.

Some embodiments of the present invention may further include one, or more, of the following features, characteristics and/or advantages: (i) manages failovers using direct connectivity between the client and the servers (where the software state between the database connections and failover is maintained directly rather than by using an intermediate proxy server); (ii) manages advanced communication between the end user client applications using backend databases to detect failures; (iii) enables failover of the client to the cluster that takes over the primary database role; (iv) works with pre-existing technology to implement high availability clusters on already available servers (such as a cluster of mainframes acting as a single system); (v) works with pre-existing technology to implement high availability clusters on already available servers (such as clustering technology that helps deliver high database availability); and/or (vi) enables automatic seamless failover between clusters without the need for any additional software. Further with regard to item (vi), a typical example of additional software that is not needed is proxy software, where pre-existing technology for enabling automatic cluster failover is already built into the end client database driver code and/or the database itself.

IV. Definitions

Present invention: should not be taken as an absolute indication that the subject matter described by the term “present invention” is covered by either the claims as they are filed, or by the claims that may eventually issue after patent prosecution; while the term “present invention” is used to help the reader to get a general feel for which disclosures herein are believed to potentially be new, this understanding, as indicated by use of the term “present invention,” is tentative and provisional and subject to change over the course of patent prosecution as relevant information is developed and as the claims are potentially amended.

Embodiment: see definition of “present invention” above; similar cautions apply to the term “embodiment.”

and/or: inclusive or; for example, A, B “and/or” C means that at least one of A or B or C is true and applicable.

Module/Sub-Module: any set of hardware, firmware and/or software that operatively works to do some kind of function, without regard to whether the module is: (i) in a single local proximity; (ii) distributed over a wide area; (iii) in a single proximity within a larger piece of software code; (iv) located within a single piece of software code; (v) located in a single storage device, memory or medium; (vi) mechanically connected; (vii) electrically connected; and/or (viii) connected in data communication.

Computer: any device with significant data processing and/or machine readable instruction reading capabilities including, but not limited to: desktop computers, mainframe computers, laptop computers, field-programmable gate array (FPGA) based devices, smart phones, personal digital assistants (PDAs), body-mounted or inserted computers, embedded device style computers, and application-specific integrated circuit (ASIC) based devices.

Seamlessly moving: moving to another member without throwing an error back to the application and without any manual intervention.
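
As a hedged illustration of this definition, the sketch below retries a statement on the next available member without surfacing an error to the application, provided the failure occurs at a unit of work boundary. All names here (MemberUnavailable, run_seamlessly, and the per-member callables) are hypothetical and introduced only for illustration; the single boundary check is an assumed stand-in for the predetermined set of requirements for a safe failover.

```python
from typing import Callable, Iterable, Optional


class MemberUnavailable(Exception):
    """Hypothetical signal that a member can no longer serve requests."""


def run_seamlessly(
    statement: str,
    members: Iterable[Callable[[str], str]],
    at_unit_of_work_boundary: bool,
) -> str:
    """Run a statement, moving to the next member if the current one fails.

    Each entry in members is an assumed callable that executes the statement
    on one member and returns its result, or raises MemberUnavailable.
    """
    last_error: Optional[MemberUnavailable] = None
    for execute in members:
        try:
            return execute(statement)
        except MemberUnavailable as exc:
            if not at_unit_of_work_boundary:
                # Mid-transaction work cannot be safely replayed, so the
                # error is returned to the application in this case.
                raise
            # Safe to retry: remember the error and try the next member.
            last_error = exc
    # Every member failed, so the move could not be made seamlessly.
    raise MemberUnavailable("no member available") from last_error
```

The application sees an error only when the replay would be unsafe or when no member remains; otherwise the move between members is invisible to it.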

What is claimed is:
 1. A computer program product comprising a computer readable storage medium having stored thereon: first program instructions programmed to operate a database management system (DBMS) including an initial primary cluster and a plurality of standby clusters; second program instructions programmed to communicate to a set of client driver(s) connecting a first application to the initial primary cluster an identity of the plurality of standby clusters; third program instructions programmed to identify one or more internet protocol addresses, the one or more internet protocol addresses being associated with each of the plurality of standby clusters; fourth program instructions to select a standby cluster of the plurality of standby clusters as the primary standby cluster; fifth program instructions programmed to in response to the initial primary cluster being unavailable and subsequent to the selection of the primary standby cluster, assign the primary standby cluster as a new primary cluster in place of the initial primary cluster; and sixth program instructions programmed to in response to assignment of the new primary cluster, direct that a software state between the first application and the initial primary cluster be saved; determine which of the one or more internet protocol addresses is associated with the new primary cluster; direct the set of client driver(s) to communicate with the internet protocol addresses associated with the new primary cluster; select, based on a balancing load algorithm and an automatic failover technique, a member of the new primary cluster; fail over a first in-progress transaction of the first application to the selected member of the new primary cluster, in response to a predetermined set of requirement(s) for a safe failover being satisfied; notify, via an interface, a manager of the DBMS of the assignment to the new primary cluster; and prompt the manager of the DBMS whether to manually change from the new primary cluster to a different primary cluster.
 2. The product of claim 1 wherein: the seamless movement of the first application occurs without any human intervention at all.
 3. The product of claim 1 wherein: the seamless movement of the first application avoids an outage that would occur absent the seamless movement.
 4. The product of claim 1 wherein the medium has further stored thereon: seventh program instructions programmed to determine that the initial primary cluster has become unavailable by determining that no members of a plurality of members of the initial first cluster are available for use by the DBMS.
 5. The product of claim 1 wherein the medium has further stored thereon: seventh program instructions programmed to cycle, by a first client driver of the set of client driver(s), through the standby clusters of the plurality of standby clusters without returning a connection failure to the first application; and eighth program instructions programmed to find a standby cluster of the plurality of standby clusters that has been assigned as the new primary cluster.
 6. The product of claim 1 wherein the medium has further stored thereon: seventh program instructions programmed to on condition a predetermined set of requirement(s) for a safe failover are satisfied, failover a first in-progress transaction of the first application to a selected member of the new primary cluster.
 7. The product of claim 6 wherein: the predetermined set of requirements for a safe failover include the following: existence of a Structured Query Language (SQL) query at a unit of work boundary.
 8. The product of claim 1 wherein the fifth program instructions are further programmed to seamlessly move the first application without any substantial human intervention.
 9. A computer system comprising: a processor(s) set; and a computer readable storage medium; wherein: the processor set is structured, located, connected and/or programmed to run program instructions stored on the computer readable storage medium; and the program instructions include: first program instructions programmed to operate a database management system (DBMS) including an initial primary cluster and a plurality of standby clusters; second program instructions programmed to communicate to a set of client driver(s) connecting a first application to the initial primary cluster an identity of the plurality of standby clusters; third program instructions programmed to identify one or more internet protocol addresses, the one or more internet protocol addresses being associated with each of the plurality of standby clusters; fourth program instructions programmed to determine whether one or more requests to the initial primary server by the set of client driver(s) are not served; fifth program instructions programmed to in response to determining that the one or more requests to the initial primary server by the set of client driver(s) are not served, assign a selected standby cluster of the plurality of standby clusters to be assigned as a new primary cluster in place of the initial primary cluster; and sixth program instructions programmed to in response to assignment of the new primary cluster, direct that a software state between the first application and the initial primary cluster be saved; determine which of the one or more internet protocol addresses is associated with the new primary cluster; direct the set of client driver(s) to communicate with the internet protocol addresses associated with the new primary cluster; select, based on a balancing load algorithm and an automatic failover technique, a member of the new primary cluster; fail over a first in-progress transaction of the first application to the selected member of the new primary cluster, in response to a predetermined set of requirement(s) for a safe failover being satisfied; notify, via an interface, a manager of the DBMS of the assignment to the new primary cluster; and prompt the manager of the DBMS whether to manually change from the new primary cluster to a different primary cluster.
 10. The system of claim 9 wherein the program instructions further include: seventh instructions programmed to periodically check whether the initial primary server has become available; and eighth instructions programmed to in response to determining that the initial primary server has become available, direct the set of client driver(s) to route future requests to the initial primary server.
 11. The system of claim 9 wherein: directing the set of client driver(s) to route future requests to the new primary cluster avoids an outage that would occur absent the seamless movement.
 12. The system of claim 9 wherein the sixth program instructions are further programmed to direct the set of client driver(s) to route future requests to the new primary cluster such that the first application is seamlessly moved to the new primary cluster without any substantial human intervention.